 time pressure


SimClinician: A Multimodal Simulation Testbed for Reliable Psychologist AI Collaboration in Mental Health Diagnosis

arXiv.org Artificial Intelligence

AI-based mental health diagnosis is often judged by benchmark accuracy, yet in practice its value depends on how psychologists respond: whether they accept, adjust, or reject AI suggestions. Mental health makes this especially challenging: decisions are continuous and shaped by cues in tone, pauses, word choice, and nonverbal behaviors of patients. Current research rarely examines how AI diagnosis interface design influences these choices, leaving little basis for reliable testing before live studies. We present SimClinician, an interactive simulation platform that transforms patient data into psychologist-AI collaborative diagnosis. Contributions include: (1) a dashboard integrating audio, text, and gaze-expression patterns; (2) an avatar module rendering de-identified dynamics for analysis; (3) a decision layer that maps AI outputs to multimodal evidence, letting psychologists review AI reasoning and enter a diagnosis. Tested on the E-DAIC corpus (276 clinical interviews, expanded to 480,000 simulations), SimClinician shows that a confirmation step raises acceptance by 23% while keeping escalations below 9% and maintaining smooth interaction flow.


Real-Time Reasoning Agents in Evolving Environments

arXiv.org Artificial Intelligence

Agents in the real world must make not only logical but also timely judgments. This requires continuous awareness of the dynamic environment: hazards emerge, opportunities arise, and other agents act, while the agent's reasoning is still unfolding. Despite advances in language model reasoning, existing approaches fail to account for this dynamic nature. We introduce real-time reasoning as a new problem formulation for agents in evolving environments and build Real-Time Reasoning Gym to demonstrate it. We study two paradigms for deploying language models in agents: (1) reactive agents, which employ language models with bounded reasoning computation for rapid responses, and (2) planning agents, which allow extended reasoning computation for complex problems. Our experiments show that even state-of-the-art models struggle with making logical and timely judgments in either paradigm. To address this limitation, we propose AgileThinker, which simultaneously engages both reasoning paradigms. AgileThinker consistently outperforms agents engaging only one reasoning paradigm as the task difficulty and time pressure rise, effectively balancing reasoning depth and response latency. Our work establishes real-time reasoning as a critical testbed for developing practical agents and provides a foundation for research in temporally constrained AI systems, highlighting a path toward real-time capable agents.


AISysRev -- LLM-based Tool for Title-abstract Screening

arXiv.org Artificial Intelligence

Systematic reviews are a standard practice for summarizing the state of evidence in software engineering. Conducting systematic reviews is laborious, especially during the screening or study selection phase, where the number of papers can be overwhelming. During this phase, papers are assessed against inclusion and exclusion criteria based on their titles and abstracts. Recent research has demonstrated that large language models (LLMs) can perform title-abstract screening at a level comparable to that of a master's student. While LLMs cannot be fully trusted, they can help, for example, in Rapid Reviews, which try to expedite the review process. Building on recent research, we developed AiSysRev, an LLM-based screening tool implemented as a web application running in a Docker container. The tool accepts a CSV file containing paper titles and abstracts. Users specify inclusion and exclusion criteria. One can use multiple LLMs for screening via OpenRouter. AiSysRev supports both zero-shot and few-shot screening, and also allows for manual screening through interfaces that display LLM results as guidance for human reviewers. We conducted a trial study with 137 papers using the tool. Our findings indicate that papers can be classified into four categories: Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. The Boundary cases, where LLMs are prone to errors, highlight the need for human intervention. While LLMs do not replace human judgment in systematic reviews, they can significantly reduce the burden of assessing large volumes of scientific literature. Video: https://www.youtube.com/watch?v=jVbEj4Y4tQI Tool: https://github.com/EvoTestOps/AISysRev


Discrete Minds in a Continuous World: Do Language Models Know Time Passes?

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) excel at temporal reasoning tasks like event ordering and duration estimation, their ability to perceive the actual passage of time remains unexplored. We investigate whether LLMs perceive the passage of time and adapt their decision-making accordingly through three complementary experiments. First, we introduce the Token-Time Hypothesis, positing that LLMs can map discrete token counts to continuous wall-clock time, and validate this through a dialogue duration judgment task. Second, we demonstrate that LLMs can use this awareness to adapt their response length while maintaining accuracy when users express urgency in question answering tasks. Finally, we develop BombRush, an interactive navigation challenge that examines how LLMs modify behavior under progressive time pressure in dynamic environments. Our findings indicate that LLMs possess a certain awareness of time passage, enabling them to bridge discrete linguistic tokens and continuous physical time, though this capability varies with model size and reasoning abilities. This work establishes a theoretical foundation for enhancing temporal awareness in LLMs for time-sensitive applications.


Evidence of conceptual mastery in the application of rules by Large Language Models

arXiv.org Artificial Intelligence

In this paper we leverage psychological methods to investigate LLMs' conceptual mastery in applying rules. We introduce a novel procedure to match the diversity of thought generated by LLMs to that observed in a human sample. We then conducted two experiments comparing rule-based decision-making in humans and LLMs. Study 1 found that all investigated LLMs replicated human patterns regardless of whether they are prompted with scenarios created before or after their training cut-off. Moreover, we found unanticipated differences between the two sets of scenarios among humans. Surprisingly, even these differences were replicated in LLM responses. Study 2 turned to a contextual feature of human rule application: under forced time delay, human samples rely more heavily on a rule's text than on other considerations such as a rule's purpose. Our results revealed that some models (Gemini Pro and Claude 3) responded in a human-like manner to a prompt describing either forced delay or time pressure, while others (GPT-4o and Llama 3.2 90b) did not. We argue that the evidence gathered suggests that LLMs have mastery over the concept of rule, with implications for both legal decision making and philosophical inquiry.


Beware of these 7 new hacker tricks -- and how to protect yourself

PCWorld

Following the huge wave of ransomware last year, there are now increasing reports of completely new tricks used by hackers and cybercriminals to gain access to computer systems, devices, and networks. Many of these tricks exploit existing vulnerabilities in applications and operating systems, but these perpetrators are also developing completely new approaches that combine technical procedures with social engineering to achieve their goals. To recap if you're unaware: social engineering is when a malicious person exploits you through helpfulness, trust, fear, or respect in an attempt to manipulate you into doing something. Examples of social engineering include: a work email purporting to come from your boss with a payment order for a large sum to a foreign account; a WhatsApp message from someone pretending to be your relative in need of money; or a phishing email that claims to be from your bank, asking you to click a link and threatening scary consequences if you don't. Here are some of the latest scams and techniques used by criminals that you need to know about--and how you can protect yourself.


A Sim2Real Approach for Identifying Task-Relevant Properties in Interpretable Machine Learning

arXiv.org Artificial Intelligence

In the context of human+AI interaction, explanations of the underlying function can provide additional information to assist the human in performing their task. Recent literature suggests that explanations with different properties are useful for different tasks [Liao et al., 2022, Lai et al., 2023, Chen et al., 2023, Jesus et al., 2021, Wang et al., 2019, Liao et al., 2020, Lim and Dey, 2009]. For example, in an AI-auditing task, the user may need to check whether the AI inappropriately relied on a forbidden feature, such as using gender in computing a credit score [Kaur et al., 2020, Hase and Bansal, 2020a, Lakkaraju et al., 2019]. In this case, we would want explanations that are faithful; that is, they reliably capture the underlying behavior of the function. On the other hand, suppose our goal is to help a user quickly understand the process by which a function produces its output; we can quantify the user's understanding by measuring the user's ability to approximate the function's output, given the input and an explanation [Hase and Bansal, 2020b, Chandrasekaran et al., 2018]. In this case, we may want explanations with low complexity, so that the user can effectively reason using the explanation in a limited amount of time.


Modelling human logical reasoning process in dynamic environmental stress with cognitive agents

arXiv.org Artificial Intelligence

Modelling human cognition can provide key insights into behavioral dynamics under changing conditions. This enables synthetic data generation and guides adaptive interventions for cognitive regulation. Challenges arise when environments are highly dynamic, obscuring stimulus-behavior relationships. We propose a cognitive agent integrating drift-diffusion with deep reinforcement learning to simulate granular stress effects on logical reasoning process. Leveraging a large dataset of 21,157 logical responses, we investigate performance impacts of dynamic stress. This prior knowledge informed model design and evaluation. Quantitatively, the framework improves cognition modelling by capturing both subject-specific and stimuli-specific behavioural differences. Qualitatively, it captures general trends in human logical reasoning under stress. Our approach is extensible to examining diverse environmental influences on cognition and behavior. Overall, this work demonstrates a powerful, data-driven methodology to simulate and understand the vagaries of human logical reasoning process in dynamic contexts.


Heuristic Satisficing Inferential Decision Making in Human and Robot Active Perception

arXiv.org Artificial Intelligence

Inferential decision-making algorithms typically assume that an underlying probabilistic model of decision alternatives and outcomes may be learned a priori or online. Furthermore, when applied to robots in real-world settings they often perform unsatisfactorily or fail to accomplish the necessary tasks because this assumption is violated and/or they experience unanticipated external pressures and constraints. Cognitive studies presented in this and other papers show that humans cope with complex and unknown settings by modulating between near-optimal and satisficing solutions, including heuristics, by leveraging information value of available environmental cues that are possibly redundant. Using the benchmark inferential decision problem known as "treasure hunt", this paper develops a general approach for investigating and modeling active perception solutions under pressure. By simulating treasure hunt problems in virtual worlds, our approach learns generalizable strategies from high performers that, when applied to robots, allow them to modulate between optimal and heuristic solutions on the basis of external pressures and probabilistic models, if and when available. The result is a suite of active perception algorithms for camera-equipped robots that outperform treasure-hunt solutions obtained via cell decomposition, information roadmap, and information potential algorithms, in both high-fidelity numerical simulations and physical experiments. The effectiveness of the new active perception strategies is demonstrated under a broad range of unanticipated conditions that cause existing algorithms to fail to complete the search for treasures, such as unmodelled time constraints, resource constraints, and adverse weather (fog).


Who Wrote this? How Smart Replies Impact Language and Agency in the Workplace

arXiv.org Artificial Intelligence

AI-mediated communication is designed to help us do our work more quickly and efficiently. But does it come at a cost? This study uses smart replies (SRs) to show how AI influences humans without any intent on the part of the developer - the very use of AI is sufficient. I propose a loss of agency theory as a viable approach for studying the impact of AI on human agency. This theory focusses on the transfer of agency that is forced by circumstances (such as time pressure), human weaknesses (such as complacency), and conceptual priming. Mixed methods involving a crowdsourced experiment test that theory. The quantitative results reveal that machine agency affects the content we author and the behavior we generate. But it is a non-zero-sum game. The transfers between human and machine agency are fluid; they complement, replace, and reinforce each other at the same time.